Stock Analysis of Tech companies &
Prediction using Long Short-Term Memory

Stock Analysis with Python

Here, we look at data from the stock market, particularly some technology stocks. We will learn how to use pandas to get stock information, visualize different aspects of it, and finally we will look at a few ways of analyzing the risk of a stock, based on its previous performance history.

What was the change in price of the stock over time?

In this section, we handle requesting stock information with pandas and analyze basic attributes of a stock.

Start by importing the libraries pandas, numpy, matplotlib, seaborn and yfinance (used to download data from Yahoo Finance).

Next, we download one year of stock data for the required tech companies (Apple, Google, Microsoft, Amazon), with start date '2020-10-28' and end date '2021-10-28'. We then combine the datasets of all companies, using zip() to pair each dataset with its company name; to tell the rows apart, a company_name column holding that name is added to each dataset.

                  Open        High         Low       Close   Adj Close     Volume  company_name
Date
2020-10-27  115.489998  117.279999  114.540001  116.599998  115.685997   92276800         APPLE
2020-10-28  115.050003  115.430000  111.099998  111.199997  110.328323  143937800         APPLE
2020-10-29  112.370003  116.930000  112.199997  115.320000  114.416039  146129200         APPLE
2020-10-30  111.059998  111.989998  107.720001  108.860001  108.006683  190272600         APPLE
2020-11-02  109.110001  110.680000  107.320000  108.769997  107.917374  122866900         APPLE
2020-11-03  109.660004  111.489998  108.730003  110.440002  109.574287  107624400         APPLE
2020-11-04  114.139999  115.589996  112.349998  114.949997  114.048935  138235500         APPLE
2020-11-05  117.949997  119.620003  116.870003  119.029999  118.096947  126387100         APPLE
2020-11-06  118.320000  119.199997  116.129997  118.690002  117.962784  114457900         APPLE
2020-11-09  120.500000  121.989998  116.050003  116.320000  115.607300  154515300         APPLE
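
The combining step described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the price tables are made-up stand-ins for the downloaded data, the helper fake_prices and the variable names company_list and company_name are hypothetical, and the use of pd.concat to stack the tables is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the per-company price tables (values are made up).
dates = pd.date_range("2020-10-27", periods=5, freq="B")
rng = np.random.default_rng(0)


def fake_prices(start):
    """Return a small made-up price table with a single 'Adj Close' column."""
    steps = rng.normal(0, 1, len(dates)).cumsum()
    return pd.DataFrame({"Adj Close": start + steps}, index=dates)


company_list = [fake_prices(115), fake_prices(1600), fake_prices(210), fake_prices(3200)]
company_name = ["APPLE", "GOOGLE", "MICROSOFT", "AMAZON"]

# zip() pairs each DataFrame with its name so the label column can be added.
for frame, name in zip(company_list, company_name):
    frame["company_name"] = name

# Stack the labelled tables into one long DataFrame (assumed combining step).
df = pd.concat(company_list, axis=0)
print(df.groupby("company_name").size())
```

Keeping the label as a column, rather than as separate variables per company, lets later steps filter or group the combined table by company_name.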

Summary statistics using describe()

             Open        High         Low       Close   Adj Close        Volume
count  253.000000  253.000000  253.000000  253.000000  253.000000  2.530000e+02
mean   133.404348  134.758617  132.012845  133.432055  132.920135  9.275860e+07
std     11.002849   10.936415   11.051245   11.028159   11.171935  2.958864e+07
min    109.110001  110.680000  107.320000  108.769997  107.917374  4.639770e+07
25%    124.529999  126.150002  123.089996  124.970001  124.539215  7.243410e+07
50%    132.160004  133.750000  130.929993  132.029999  131.401062  8.766880e+07
75%    143.660004  144.889999  142.649994  143.759995  143.550491  1.077601e+08
max    156.979996  157.259995  154.389999  156.690002  156.461655  1.925415e+08

General information using info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 253 entries, 2020-10-27 to 2021-10-27
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          253 non-null    float64
 1   High          253 non-null    float64
 2   Low           253 non-null    float64
 3   Close         253 non-null    float64
 4   Adj Close     253 non-null    float64
 5   Volume        253 non-null    int64  
 6   company_name  253 non-null    object 
dtypes: float64(5), int64(1), object(1)
memory usage: 15.8+ KB

Visualization using historical view of the closing price

Visualization using historical view of the total volume.

What was the moving average of the various stocks?

Let's move on and look at the 10-, 20- and 50-day moving averages of the various stocks. For each window we add a column, named after the number of days, that holds the moving mean of the 'Adj Close' column. The following table shows the last 10 rows.

                   Open         High          Low        Close    Adj Close  \
Date                                                                          
2021-10-14  2799.040039  2833.030029  2786.780029  2828.239990  2828.239990   
2021-10-15  2844.000000  2844.000000  2821.290039  2833.500000  2833.500000   
2021-10-18  2824.270020  2859.975098  2824.270020  2859.209961  2859.209961   
2021-10-19  2865.830078  2882.139893  2861.919922  2876.439941  2876.439941   
2021-10-20  2884.449951  2884.955078  2838.239990  2848.300049  2848.300049   
2021-10-21  2843.840088  2856.989990  2832.739990  2855.610107  2855.610107   
2021-10-22  2807.020020  2831.169922  2743.409912  2772.500000  2772.500000   
2021-10-25  2776.209961  2784.115967  2734.969971  2775.459961  2775.459961   
2021-10-26  2812.120117  2816.790039  2780.110107  2793.439941  2793.439941   
2021-10-27  2798.050049  2982.360107  2798.050049  2928.550049  2928.550049   

             Volume company_name  MA for 10 days  MA for 20 days  \
Date                                                               
2021-10-14  1071300       GOOGLE     2755.745020     2768.869006   
2021-10-15  1062500       GOOGLE     2766.170020     2769.080505   
2021-10-18   828200       GOOGLE     2784.561011     2773.023999   
2021-10-19   765800       GOOGLE     2799.851001     2777.199500   
2021-10-20   897000       GOOGLE     2809.972998     2778.676001   
2021-10-21   742500       GOOGLE     2817.163013     2779.630005   
2021-10-22  1509100       GOOGLE     2814.301001     2775.622009   
2021-10-25  1054100       GOOGLE     2814.152002     2772.894006   
2021-10-26  1412900       GOOGLE     2820.069995     2776.382007   
2021-10-27  2592500       GOOGLE     2837.125000     2788.288513   

            MA for 50 days  
Date                        
2021-10-14     2805.656802  
2021-10-15     2807.550801  
2021-10-18     2809.920601  
2021-10-19     2812.248599  
2021-10-20     2813.976001  
2021-10-21     2816.012402  
2021-10-22     2816.106602  
2021-10-25     2816.253398  
2021-10-26     2816.555796  
2021-10-27     2820.206597  
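
As a sketch of how such a column can be computed, the snippet below applies a 10-row rolling mean to the ten 'Adj Close' values shown in the table above. It assumes pandas' rolling(...).mean() is the method used; the variable names are hypothetical. With only ten rows, just the last row has a full window, and its value agrees with the 'MA for 10 days' entry for 2021-10-27 in the table.

```python
import pandas as pd

# The ten 'Adj Close' values from the table above (2021-10-14 to 2021-10-27).
prices = pd.DataFrame(
    {
        "Adj Close": [
            2828.239990, 2833.500000, 2859.209961, 2876.439941, 2848.300049,
            2855.610107, 2772.500000, 2775.459961, 2793.439941, 2928.550049,
        ]
    }
)

ma = 10
column_name = f"MA for {ma} days"
# rolling(ma).mean() averages the current row and the ma - 1 rows before it;
# rows without a full window are NaN.
prices[column_name] = prices["Adj Close"].rolling(ma).mean()

print(prices.tail(3))
```

The same call with windows of 20 and 50 would fill the other two columns, given enough history.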

Plotting all the moving averages using matplotlib (2×2 grid of subplot axes).

What was the daily return of the stock on average?

So far we have only done some baseline analysis; let's dive a little deeper. We're now going to analyze the risk of the stock. To do so we need to take a closer look at the daily changes of the stock, and not just its absolute value. Let's use pandas to retrieve the daily returns for the Apple stock.

The steps are: create a new column holding the percentage change of the 'Adj Close' value, then plot it, again on a 2×2 grid of subplot axes (here with line style '--' and marker 'o').
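
A minimal sketch of the daily-return step, using the first five Apple 'Adj Close' values from the table near the top. The column name 'Daily Return' is a hypothetical choice; pct_change() is the pandas method for percentage change between consecutive rows.

```python
import pandas as pd

# First five 'Adj Close' values for Apple, copied from the table near the top.
apple = pd.DataFrame(
    {"Adj Close": [115.685997, 110.328323, 114.416039, 108.006683, 107.917374]},
    index=pd.to_datetime(
        ["2020-10-27", "2020-10-28", "2020-10-29", "2020-10-30", "2020-11-02"]
    ),
)

# pct_change() returns (today / yesterday) - 1; the first row has no
# previous day, so it is NaN.
apple["Daily Return"] = apple["Adj Close"].pct_change()
print(apple)
```

The returns are fractions, so a value of -0.046 corresponds to a drop of about 4.6%.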

Below is an overall look at the average daily return using a histogram. We'll use seaborn to create both a histogram and kde plot on the same figure.

The steps are: use a for loop to draw each subplot with seaborn (using displot).

What was the correlation between different stocks' closing prices?

Building a DataFrame with all the ['Close'] columns for each of the stocks dataframes.

                  AAPL         AMZN         GOOG        MSFT
Date
2020-10-27  115.686005  3286.330078  1604.260010  211.311005
2020-10-28  110.328331  3162.780029  1516.619995  200.837112
2020-10-29  114.416031  3211.010010  1567.239990  202.858566
2020-10-30  108.006676  3036.149902  1621.010010  200.629013
2020-11-02  107.917381  3004.479980  1626.030029  200.490295

Getting the daily return for all the stocks (by converting prices to percentage changes with pct_change()), as we did for the Apple stock.

                 AAPL       AMZN       GOOG       MSFT
Date
2020-10-27        NaN        NaN        NaN        NaN
2020-10-28  -0.046312  -0.037595  -0.054630  -0.049566
2020-10-29   0.037050   0.015249   0.033377   0.010065
2020-10-30  -0.056018  -0.054456   0.034309  -0.010991
2020-11-02  -0.000827  -0.010431   0.003097  -0.000691

Next we compare the daily percentage returns of two stocks to check how correlated they are. First, let's look at a stock compared to itself.

The steps are: create a jointplot with seaborn, using kind 'scatter' (Google vs. Google).


Creating a jointplot with seaborn, using kind 'scatter' (Google vs. Microsoft).


If two stocks are perfectly (and positively) correlated with each other, a linear relationship between their daily return values should appear.

Seaborn and pandas make it very easy to repeat this comparison analysis for every possible combination of stocks in our technology stock ticker list. We can use sns.pairplot() to create this plot automatically.

Simply calling pairplot with kind 'reg' on our DataFrame gives an automatic visual analysis of all the comparisons.


Calling pairplot with kind 'kde'.


Above we can see all the relationships between the daily returns of all the stocks. A quick glance shows an interesting correlation between Google and Amazon daily returns. It might be interesting to investigate that individual comparison.

While the simplicity of just calling sns.pairplot() is fantastic, we can also use sns.PairGrid() for full control of the figure, including what kind of plots go in the diagonal, the upper triangle, and the lower triangle. Below is an example of utilizing the full power of seaborn to achieve this result.


After switching the upper triangle and the lower triangle:


Finally, we could also do a correlation plot, to get actual numerical values for the correlation between the stocks' daily return values. By comparing the closing prices, we see an interesting relationship between Microsoft and Apple.

Using heatmap from seaborn with input 'tech_rets.corr()' and cmap 'YlGnBu'.


Using heatmap from seaborn with input 'df_closing.corr()' and cmap 'summer'.


As suspected from our pair plot, we see here, both numerically and visually, that Microsoft and Amazon had the strongest correlation of daily stock returns. It is also interesting to see that all the technology companies are positively correlated.
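
The numbers behind such a heatmap come from a correlation matrix. The sketch below shows how one could compute it and pick out the most strongly correlated pair. The return series here are randomly generated stand-ins sharing a common component, not the real data, so the pair it reports says nothing about the actual stocks; only the name tech_rets is taken from the text above.

```python
import numpy as np
import pandas as pd

# Made-up daily returns: a shared "market" component plus independent noise.
rng = np.random.default_rng(42)
n_days = 250
market = rng.normal(0, 0.01, n_days)
tech_rets = pd.DataFrame(
    {
        ticker: market + rng.normal(0, 0.01, n_days)
        for ticker in ["AAPL", "AMZN", "GOOG", "MSFT"]
    }
)

# Pairwise Pearson correlation of the daily returns.
corr = tech_rets.corr()

# Find the strongest off-diagonal entry; the diagonal is always 1, so it is
# excluded before taking the maximum.
values = corr.to_numpy().copy()
np.fill_diagonal(values, -np.inf)
i, j = np.unravel_index(values.argmax(), values.shape)
pair = (corr.index[i], corr.columns[j])

print(corr.round(3))
print("Most correlated pair in this synthetic sample:", pair)
```

Passing corr to sns.heatmap, as described above, turns the same matrix into the colored grid.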

How much value do we put at risk by investing in a particular stock?

There are many ways to quantify risk. One of the most basic, using the information we've gathered on daily percentage returns, is to compare the expected return with the standard deviation of the daily returns.

Let's start by defining a new DataFrame as a cleaned version of the original tech_rets DataFrame (using dropna()). Then we draw a scatter plot with the expected return on the x-axis (x = rets.mean()) and the risk on the y-axis (y = rets.std()), with marker area = 20*pi computed using numpy.

Here we use annotate to label the points, which makes the plot easier to read.
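
The quantities placed on the two axes can be sketched as follows, using the five rows of daily returns shown in the table earlier. Four usable rows are far too few for a meaningful risk estimate, so this only illustrates the mechanics; the summary table and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# The five rows of daily returns shown earlier (the first row is all NaN).
tech_rets = pd.DataFrame(
    {
        "AAPL": [np.nan, -0.046312, 0.037050, -0.056018, -0.000827],
        "AMZN": [np.nan, -0.037595, 0.015249, -0.054456, -0.010431],
        "GOOG": [np.nan, -0.054630, 0.033377, 0.034309, 0.003097],
        "MSFT": [np.nan, -0.049566, 0.010065, -0.010991, -0.000691],
    }
)

# Drop rows with missing values, then summarize each stock.
rets = tech_rets.dropna()
summary = pd.DataFrame(
    {
        "expected_return": rets.mean(),  # x-axis of the scatter plot
        "risk": rets.std(),              # y-axis: sample standard deviation
    }
)
print(summary)
```

Each row of summary is one point in the scatter plot, and the index (the ticker) is the natural text for its annotation.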

Predicting the closing stock price of Google Inc.

                   Open         High          Low        Close    Adj Close    Volume
Date
2012-01-03   325.250885   332.827484   324.966949   331.462585   331.462585   7380561
2012-01-04   331.273315   333.873566   329.076538   332.892242   332.892242   5749470
2012-01-05   329.828735   330.745270   326.889740   328.274536   328.274536   6590410
2012-01-06   328.344299   328.767700   323.681763   323.796326   323.796326   5405987
2012-01-09   322.042908   322.291962   309.455078   310.067780   310.067780  11688849
...                 ...          ...          ...          ...          ...       ...
2021-11-08  3000.000000  3020.689941  2982.399902  2987.030029  2987.030029    919400
2021-11-09  2994.919922  3007.570068  2950.139893  2984.969971  2984.969971    843800
2021-11-10  2960.195068  2974.000000  2906.500000  2932.520020  2932.520020   1135400
2021-11-11  2942.139893  2970.044922  2933.889893  2934.959961  2934.959961    623200
2021-11-12  2956.629883  2997.189941  2929.080078  2992.909912  2992.909912    852000

2484 rows × 6 columns

2360
array([[0.01951844],
       [0.02004513],
       [0.01834396],
       ...,
       [0.97775226],
       [0.97865114],
       [1.        ]])
[array([0.01951844, 0.02004513, 0.01834396, 0.01669418, 0.01163656,
       0.01176135, 0.01227886, 0.01295418, 0.01210085, 0.01275966,
       0.01355428, 0.01477647, 0.00494384, 0.00485759, 0.00401527,
       0.00191588, 0.0016608 , 0.00384093, 0.00342069, 0.00386479,
       0.00399691, 0.00478235, 0.00684137, 0.00918299, 0.00875724,
       0.00932247, 0.00961792, 0.00859943, 0.00975372, 0.00930595,
       0.0085352 , 0.00871137, 0.00836636, 0.01008404, 0.00897195,
       0.00863612, 0.00933164, 0.00922336, 0.01088966, 0.01086397,
       0.01162555, 0.01141451, 0.01012992, 0.00842509, 0.00876275,
       0.00882515, 0.00756074, 0.00845995, 0.01077773, 0.01044924,
       0.01139249, 0.01211002, 0.01375063, 0.01366071, 0.01485171,
       0.01596564, 0.01533068, 0.01656755, 0.01614364, 0.01774754])]
[0.01639871859458371]

[array([0.01951844, 0.02004513, 0.01834396, 0.01669418, 0.01163656,
       0.01176135, 0.01227886, 0.01295418, 0.01210085, 0.01275966,
       0.01355428, 0.01477647, 0.00494384, 0.00485759, 0.00401527,
       0.00191588, 0.0016608 , 0.00384093, 0.00342069, 0.00386479,
       0.00399691, 0.00478235, 0.00684137, 0.00918299, 0.00875724,
       0.00932247, 0.00961792, 0.00859943, 0.00975372, 0.00930595,
       0.0085352 , 0.00871137, 0.00836636, 0.01008404, 0.00897195,
       0.00863612, 0.00933164, 0.00922336, 0.01088966, 0.01086397,
       0.01162555, 0.01141451, 0.01012992, 0.00842509, 0.00876275,
       0.00882515, 0.00756074, 0.00845995, 0.01077773, 0.01044924,
       0.01139249, 0.01211002, 0.01375063, 0.01366071, 0.01485171,
       0.01596564, 0.01533068, 0.01656755, 0.01614364, 0.01774754]), array([0.02004513, 0.01834396, 0.01669418, 0.01163656, 0.01176135,
       0.01227886, 0.01295418, 0.01210085, 0.01275966, 0.01355428,
       0.01477647, 0.00494384, 0.00485759, 0.00401527, 0.00191588,
       0.0016608 , 0.00384093, 0.00342069, 0.00386479, 0.00399691,
       0.00478235, 0.00684137, 0.00918299, 0.00875724, 0.00932247,
       0.00961792, 0.00859943, 0.00975372, 0.00930595, 0.0085352 ,
       0.00871137, 0.00836636, 0.01008404, 0.00897195, 0.00863612,
       0.00933164, 0.00922336, 0.01088966, 0.01086397, 0.01162555,
       0.01141451, 0.01012992, 0.00842509, 0.00876275, 0.00882515,
       0.00756074, 0.00845995, 0.01077773, 0.01044924, 0.01139249,
       0.01211002, 0.01375063, 0.01366071, 0.01485171, 0.01596564,
       0.01533068, 0.01656755, 0.01614364, 0.01774754, 0.01639872])]
[0.01639871859458371, 0.015082937419806333]
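
The printed arrays above appear consistent with scaling the closing prices to the range 0 to 1 and then sliding a 60-value window over them, where each window is an input and the value right after it is the target. The sketch below reproduces that pattern on made-up prices. The window length of 60 is inferred from the length of the printed arrays, the scaling is written out by hand (it is the same formula scikit-learn's MinMaxScaler applies for a (0, 1) range), and all variable names are assumptions.

```python
import numpy as np

# Made-up closing prices standing in for the real series.
close = np.linspace(300.0, 3000.0, 200).reshape(-1, 1)

# Min-max scaling to [0, 1]: (x - min) / (max - min).
scaled = (close - close.min()) / (close.max() - close.min())

window = 60  # assumed from the 60-value arrays in the printed output
x_train, y_train = [], []
for i in range(window, len(scaled)):
    x_train.append(scaled[i - window:i, 0])  # the previous 60 scaled values
    y_train.append(scaled[i, 0])             # the value to predict

x_train = np.array(x_train)
y_train = np.array(y_train)

# Recurrent layers such as LSTM expect input shaped (samples, timesteps, features).
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)
print(x_train.shape, y_train.shape)
```

Note how each target becomes the last element of the next window, which is the same pattern visible in the two printed arrays above.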

2300/2300 [==============================] - 91s 38ms/step - loss: 5.5100e-04
112.46023745316838
                  Close  Predictions
Date
2021-05-20  2356.090088  2234.260254
2021-05-21  2345.100098  2244.388916
2021-05-24  2406.669922  2253.925293
2021-05-25  2409.070068  2272.273193
2021-05-26  2433.530029  2290.957520
...                 ...          ...
2021-11-08  2987.030029  2845.677490
2021-11-09  2984.969971  2859.204590
2021-11-10  2932.520020  2869.168701
2021-11-11  2934.959961  2866.656738
2021-11-12  2992.909912  2860.563965

124 rows × 2 columns
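
One common way to score predictions like these is the root mean squared error (RMSE). The text does not say which metric was used, so treating RMSE as the measure is an assumption. The sketch computes it over the first five rows of the table above only, so its result is not comparable to any figure for the full 124 rows.

```python
import numpy as np

# First five rows of the Close / Predictions table above.
close = np.array([2356.090088, 2345.100098, 2406.669922, 2409.070068, 2433.530029])
predictions = np.array([2234.260254, 2244.388916, 2253.925293, 2272.273193, 2290.957520])

# RMSE: square the errors, average them, take the square root.
rmse = np.sqrt(np.mean((predictions - close) ** 2))
print(f"RMSE over these five rows: {rmse:.2f}")
```

In these five rows every prediction sits below the actual close, so the error here is a consistent underestimate rather than noise around the true value.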

Stock Market Data Visualization and Analysis

Once you have the stock market data, the next step is to create trading strategies and analyse their performance. The ease of analysing performance is a key advantage of Python.

We will analyse the cumulative returns, the drawdown plot, and ratios such as the Sharpe ratio, Sortino ratio, and Calmar ratio.

Start date           2016-11-15
End date             2021-11-12
Total months                 59
                       Backtest
Annual return             40.6%
Cumulative returns       448.5%
Annual volatility         25.0%
Sharpe ratio               1.49
Calmar ratio               1.54
Stability                  0.96
Max drawdown             -26.3%
Omega ratio                1.32
Sortino ratio              2.17
Skew                      -0.22
Kurtosis                   6.74
Tail ratio                 0.92
Daily value at risk       -3.0%
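
To make a few of these figures concrete, the sketch below computes cumulative return, annual return, annual volatility, maximum drawdown, and the Sharpe, Sortino and Calmar ratios from a series of daily returns. It uses conventional textbook definitions with a zero risk-free rate and 252 trading days per year, which are assumptions and may differ in detail from the library that produced the table; the returns are randomly generated, not the backtest's data. As a sanity check on the definition, the Calmar ratio in the table (1.54) is close to annual return divided by the size of the max drawdown, 40.6 / 26.3.

```python
import numpy as np
import pandas as pd

# Made-up daily returns standing in for a strategy's return series.
rng = np.random.default_rng(7)
returns = pd.Series(rng.normal(0.001, 0.015, 1250))

periods = 252  # assumed number of trading days per year

# Growth of 1 unit invested, and the total return over the whole period.
cumulative = (1 + returns).cumprod()
cumulative_return = cumulative.iloc[-1] - 1

# Annualized (compound) return and annualized volatility.
years = len(returns) / periods
annual_return = cumulative.iloc[-1] ** (1 / years) - 1
annual_volatility = returns.std() * np.sqrt(periods)

# Drawdown: how far the cumulative series sits below its running peak.
running_max = cumulative.cummax()
drawdown = cumulative / running_max - 1
max_drawdown = drawdown.min()

# Ratios, all with a zero risk-free rate (an assumption).
sharpe = returns.mean() / returns.std() * np.sqrt(periods)
downside = np.sqrt((np.minimum(returns, 0) ** 2).mean())  # downside deviation
sortino = returns.mean() / downside * np.sqrt(periods)
calmar = annual_return / abs(max_drawdown)

print(f"Annual return {annual_return:.1%}, max drawdown {max_drawdown:.1%}")
print(f"Sharpe {sharpe:.2f}, Sortino {sortino:.2f}, Calmar {calmar:.2f}")
```

The Sortino ratio differs from the Sharpe ratio only in its denominator: it penalizes negative returns alone, so a strategy with mostly upside volatility scores better on Sortino than on Sharpe.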

Data Analysis

We see that volume traded and closing price have an inverse relationship. This relationship is commonly observed in finance: when the closing price of a stock decreases, people are more likely to trade that stock. However, we also see that the data is very spiky. This spikiness arises because subtle market forces guide the price fluctuations.

Next, we can use an OHLC chart to visualize the data. The OHLC (open, high, low and close) chart is a financial chart describing the open, high, low and close values for a given date.

The horizontal segments represent the open and close values, and the tips of the lines represent the low and high values. Points where the close value is higher than the open are called increasing (shown in green), and points where the close value is lower than the open are called decreasing (shown in red).
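
The increasing/decreasing rule can be sketched directly on the data, using the first five Apple rows from the table near the top. The column name 'direction' is hypothetical, and treating a day whose close equals its open as decreasing is a simplification made here.

```python
import pandas as pd

# First five Apple rows (Open, High, Low, Close) from the table near the top.
ohlc = pd.DataFrame(
    {
        "Open": [115.489998, 115.050003, 112.370003, 111.059998, 109.110001],
        "High": [117.279999, 115.430000, 116.930000, 111.989998, 110.680000],
        "Low": [114.540001, 111.099998, 112.199997, 107.720001, 107.320000],
        "Close": [116.599998, 111.199997, 115.320000, 108.860001, 108.769997],
    },
    index=pd.to_datetime(
        ["2020-10-27", "2020-10-28", "2020-10-29", "2020-10-30", "2020-11-02"]
    ),
)

# A day is "increasing" when it closes above its open; otherwise it is
# labelled "decreasing" (ties included, as a simplification).
ohlc["direction"] = [
    "increasing" if close > open_ else "decreasing"
    for open_, close in zip(ohlc["Open"], ohlc["Close"])
]
print(ohlc[["Open", "Close", "direction"]])
```

A charting library would use this same comparison to decide whether each bar is drawn in green or in red.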
